AITopics | internal covariate shift

internal covariate shift

Inherent Weight Normalization in Stochastic Neural Networks

Georgios Detorakis, Sourav Dutta, Abhishek Khanna, Matthew Jerry, Suman Datta, Emre Neftci

Neural Information Processing Systems | Feb-14-2026, 16:00:18 GMT

Neural Information Processing Systems http://nips.cc/

arxiv preprint arxiv, neural network, nsm, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Orange County > Irvine (0.14)
North America > United States > Indiana > St. Joseph County > Notre Dame (0.05)
North America > Canada (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

905056c1ac1dad141560467e0a99e1cf-Paper.pdf

Neural Information Processing Systems | Feb-13-2026, 17:55:46 GMT

batchnorm, gradient, internal covariate shift, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks

Neural Information Processing Systems | Feb-10-2026, 14:14:55 GMT

Given the rich literature on random matrices, it is not surprising to find that the rank of the intermediate representations in unnormalized networks collapses quickly with depth.

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models

Sergey Ioffe

Neural Information Processing Systems | Nov-21-2025, 12:47:56 GMT

Batch Normalization is quite effective at accelerating and improving the training of deep models. However, its effectiveness diminishes when the training minibatches are small, or do not consist of independent samples. We hypothesize that this is due to the dependence of model layer inputs on all the examples in the minibatch, and different activations being produced between training and inference. We propose Batch Renormalization, a simple and effective extension to ensure that the training and inference models generate the same outputs that depend on individual examples rather than the entire minibatch. Models trained with Batch Renormalization perform substantially better than batchnorm when training with small or non-i.i.d. minibatches.
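
The mismatch between training and inference described in the abstract above can be illustrated with a minimal sketch for a single feature. This assumes the commonly described Batch Renormalization correction, in which clipped factors r and d map minibatch statistics onto running statistics. The function names, the default clipping bounds, and the omission of the learned scale and shift are illustrative choices, not taken from this page.

```python
import math


def batch_renorm_train(values, running_mean, running_std,
                       r_max=3.0, d_max=5.0, eps=1e-5):
    """Training-time normalization of one feature with a renorm correction.

    r_max, d_max, and eps are assumed defaults chosen for illustration.
    """
    n = len(values)
    batch_mean = sum(values) / n
    batch_var = sum((v - batch_mean) ** 2 for v in values) / n  # biased variance
    batch_std = math.sqrt(batch_var + eps)
    # Clipped correction factors. In a real framework these would be treated
    # as constants during backpropagation (no gradient flows through them).
    r = min(max(batch_std / running_std, 1.0 / r_max), r_max)
    d = min(max((batch_mean - running_mean) / running_std, -d_max), d_max)
    return [(v - batch_mean) / batch_std * r + d for v in values]


def batch_renorm_infer(values, running_mean, running_std):
    """Inference-time normalization: uses only the running statistics."""
    return [(v - running_mean) / running_std for v in values]


batch = [1.0, 2.0, 3.0, 4.0]
train_out = batch_renorm_train(batch, running_mean=2.0, running_std=1.0)
infer_out = batch_renorm_infer(batch, running_mean=2.0, running_std=1.0)
```

When neither factor is clipped, the correction cancels the minibatch statistics algebraically, so `train_out` and `infer_out` agree up to floating-point error. Plain batch normalization corresponds to fixing r = 1 and d = 0, in which case the two outputs generally differ.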

batch renormalization, batchnorm, minibatch, (13 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

905056c1ac1dad141560467e0a99e1cf-Paper.pdf

Neural Information Processing Systems | Nov-21-2025, 03:53:45 GMT

Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly understood.

artificial intelligence, batchnorm, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Rethinking Layer-wise Model Merging through Chain of Merges

Pietro Buzzega, Riccardo Salami, Angelo Porrello, Simone Calderara

arXiv.org Artificial Intelligence | Oct-2-2025

Fine-tuning pretrained models has become a standard pathway to achieve state-of-the-art performance across a wide range of domains, leading to a proliferation of task-specific model variants. As the number of such specialized models increases, merging them into a unified model without retraining has become a critical challenge. Existing merging techniques operate at the level of individual layers, thereby overlooking the inter-layer dependencies inherent in deep networks. We show that this simplification leads to distributional mismatches, particularly in methods that rely on intermediate activations, as changes in early layers are not properly propagated to downstream layers during merging. We identify these mismatches as a form of internal covariate shift, comparable to the phenomenon encountered in the initial phases of neural network training. To address this, we propose Chain of Merges (CoM), a layer-wise merging procedure that sequentially merges weights across layers while sequentially updating activation statistics. By explicitly accounting for inter-layer interactions, CoM mitigates covariate shift and produces a coherent merged model through a series of conditionally optimal updates. Experiments on standard benchmarks demonstrate that CoM achieves state-of-the-art performance.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2508.21421

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)

Mask-PINNs: Mitigating Internal Covariate Shift in Physics-Informed Neural Networks

Feilong Jiang, Xiaonan Hou, Jianqiao Ye, Min Xia

arXiv.org Artificial Intelligence | Sep-3-2025

Physics-Informed Neural Networks (PINNs) have emerged as a powerful framework for solving partial differential equations (PDEs) by embedding physical laws directly into the loss function. However, as a fundamental optimization issue, internal covariate shift (ICS) hinders the stable and effective training of PINNs by disrupting feature distributions and limiting model expressiveness. Unlike standard deep learning tasks, conventional remedies for ICS -- such as Batch Normalization and Layer Normalization -- are not directly applicable to PINNs, as they distort the physical consistency required for reliable PDE solutions. To address this issue, we propose Mask-PINNs, a novel architecture that introduces a learnable mask function to regulate feature distributions while preserving the underlying physical constraints of PINNs. We provide a theoretical analysis showing that the mask suppresses the expansion of feature representations through a carefully designed modulation mechanism. Empirically, we validate the method on multiple PDE benchmarks -- including convection, wave propagation, and Helmholtz equations -- across diverse activation functions. Our results show consistent improvements in prediction accuracy, convergence stability, and robustness. Furthermore, we demonstrate that Mask-PINNs enable the effective use of wider networks, overcoming a key limitation in existing PINN frameworks.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.06331

Country:

Europe (0.46)
North America > Canada (0.28)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Inherent Weight Normalization in Stochastic Neural Networks

Georgios Detorakis, Sourav Dutta, Abhishek Khanna, Matthew Jerry, Suman Datta, Emre Neftci

Neural Information Processing Systems | Aug-20-2025, 05:56:43 GMT

Neural Information Processing Systems http://nips.cc/

arxiv preprint arxiv, neural network, nsm, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Orange County > Irvine (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > Indiana > St. Joseph County > Notre Dame (0.05)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Reviews: Revisit Fuzzy Neural Network: Demystifying Batch Normalization and ReLU with Generalized Hamming Network

Neural Information Processing Systems | Oct-8-2024, 04:43:02 GMT

The authors use a notion of generalized Hamming distance (GHD) to shed light on the success of batch normalization and ReLU units. After reading the paper, I am still very confused about its contribution. The authors claim that generalized Hamming distance offers a better view of batch normalization and ReLUs, and explain this in two paragraphs on pages 4 and 5. The explanation for batch normalization is essentially contained in the following phrase: "It turns out BN is indeed attempting to compensate for deficiencies in neuron outputs with respect to GHD. This surprising observation indeed adheres to our conjecture that an optimized neuron should faithfully measure the GHD between inputs and weights."

demystifying batch normalization and relu, network, revisit fuzzy neural network, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.43)

Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models

Sergey Ioffe

Neural Information Processing Systems | Oct-4-2024, 05:21:11 GMT

Batch Normalization is quite effective at accelerating and improving the training of deep models. However, its effectiveness diminishes when the training minibatches are small, or do not consist of independent samples. We hypothesize that this is due to the dependence of model layer inputs on all the examples in the minibatch, and different activations being produced between training and inference. We propose Batch Renormalization, a simple and effective extension to ensure that the training and inference models generate the same outputs that depend on individual examples rather than the entire minibatch. Models trained with Batch Renormalization perform substantially better than batchnorm when training with small or non-i.i.d. minibatches.

batch renormalization, batchnorm, minibatch, (13 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
